Intelligent tit-for-tat in the iterated prisoner's dilemma game 
We seek a route to the equilibrium where all the agents cooperate in the iterated prisoner's 
dilemma game on a two-dimensional plane, focusing on the role of tit-for-tat strategy. When a time 
horizon, within which a strategy can recall the past, is one time step, an equilibrium can be achieved 
as cooperating strategies dominate the whole population via proliferation of tit-for-tat. Extending 
the time horizon, we filter out poor strategies by simplified replicator dynamics and observe a similar 
evolutionary pattern to reach the cooperating equilibrium. In particular, the rise of a modified tit- 
for-tat strategy plays a central role, which implies how a robust strategy is adopted when provided 
with an enhanced memory capacity. 
I. INTRODUCTION 



One of the main interests in statistical physics is related 
to the equilibration process of a given system 
composed of many interacting elements. For instance, 
the classical Ising system made up of locally interact- 
ing spins approaches an equilibrium, characterized by the 
minimum of the Helmholtz free energy. Such a model system 
in statistical physics is defined by the Hamiltonian 
and can be readily studied by updating spins with local 
Monte Carlo rules in numerical simulations [1]. A lot 
of interactions including ecological, social, and economi- 
cal ones are more complicated than that of spins as they 
are usually asymmetric and history dependent. Further- 
more, most of these systems beyond the simple physics 
model cannot be described by the simple Hamiltonian 
approach. Even if no analytical solution is available, we 
may expect the system to evolve by successive local adaptations, 
searching for an optimal point on the fitness 
landscape. However, when the interaction is asymmet- 
ric, it is possible that the equilibrium reached by local 
dynamics may not be optimal in a global sense. 

The prisoner's dilemma (PD) game is a famous model 
of such disparity. The typical story begins as follows. 
Two suspected accomplices are caught by the police for 
a crime deserving of 4 years' imprisonment each. After 
separating two suspects from each other, the police of- 
fers a deal to each of them: If only one confesses the 
crime and the other remains silent, the informer will be 
rewarded and set free, while the other one will receive 
an aggravated punishment (say 5 years in prison). On 
the other hand, if both keep silent, they will get some 
punishment which is supposed to be not so heavy (e.g., 
2 years in prison). It is still true that they can get light 
punishments by cooperating with each other. From an individual 
viewpoint, however, it is always better to defect against 
the other, so they will eventually be sentenced to 8 years in 
total, as the police wanted. Throughout the present paper, 
the game results are quantified by four elementary 
payoffs: the temptation to defect, T = 5; the reward 
for cooperation, R = 3; the punishment for mutual 
defection, P = 1; and the sucker's payoff for being 
exploited, S = 0. Note that the payoffs satisfy two inequalities. 
The first one, T > R > P > S, locates the Nash equilibrium 
[4] at mutual defection, and the second, 2R > T + S, 
sets mutual cooperation as optimal in total. 
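
The two inequalities can be checked directly from the payoff values. The following is a minimal sketch; the dictionary `PAYOFF` and the helper `best_reply` are illustrative names introduced here, not part of the original work.

```python
# Elementary payoffs used throughout the paper.
T, R, P, S = 5, 3, 1, 0

# PAYOFF[(my_move, opponent_move)] is the payoff to "me" (illustrative layout).
PAYOFF = {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}


def best_reply(opponent_move):
    """Return the move that maximizes my one-shot payoff against a fixed opponent move."""
    return max("CD", key=lambda my_move: PAYOFF[(my_move, opponent_move)])


# First inequality: defection is the best reply to either move,
# so mutual defection is the Nash equilibrium of the one-shot game.
is_dilemma = T > R > P > S

# Second inequality: two cooperators earn more in total than a
# defector-cooperator pair, so mutual cooperation is optimal in total.
cooperation_is_optimal = 2 * R > T + S
```

With these values, defection is the best reply to both C and D, while the total payoff 2R = 6 of mutual cooperation exceeds T + S = 5.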

The conclusion of the PD game is highly nontrivial 
in that local optimization will end up with the poorest 
result in a global sense. The first breakthrough in this 
dilemma was made by performing the game iteratively, 
where the system could achieve the optimal point of mutual 
cooperation. Iteration affects the system's trajectory 
in two ways: Since the strategy space comes to 
have a much larger dimensionality than choosing between 
cooperation and defection, there enters a possible route 
to mutual cooperation. In addition, as the time scale of 
interaction is separated from that of selection, the sta- 
bility of equilibria and their basins of attraction may be 
changed: As to the PD game, for example, slow selec- 
tion favors the weaker strategy (i.e., cooperation) from a 
population genetics point of view. Nevertheless, the 
equilibrium in which all agents cooperate is usually ac- 
cessed by a detour consisting of intermediate stages. 

In the iterated PD game, there are successfully cooperating 
strategies, some of which are as follows: (i) Grim 
trigger (GT) initially cooperates, but any single defection 
by its opponent makes GT defect forever. (ii) Tit-for-tat 
(TFT) also starts with cooperation, and then does 
what the opponent did. This simple strategy is famous 
for its virtues, i.e., being nice, retaliating, forgiving, 
and nonenvious [3]. By nice, we mean that a strategy 
never provokes the opponent by defecting first. Likewise, 
retaliating and forgiving mean that it defects after being 
defected against, and cooperates when the opponent changes back 
to cooperation. Finally, by being nonenvious, TFT allows 
coexistence of other strategies. However, one should 
note that an erroneous defection between two TFT players leads to 
a chain of retribution until a new error makes them cooperate 
again. (iii) Pavlov keeps its last move if paid highly 
and switches to the other move otherwise, which is why it is often 
called win-stay lose-shift [7]. Unlike the other two, Pavlov players 
forgive a mistake between themselves. 

Since the above-mentioned three strategies remember 
only the moves in the last time step, they all belong to 
a set of strategies confined to a time horizon 
of one time step, which we will call M1. Note that 
the actual amount of information in use is different: GT 
and TFT require only the opponent's last move, while 
Pavlov recalls both its opponent's and its own. Likewise, 
Mn denotes the set of strategies which use the last 
k (≤ n) time steps in making a decision. By giving an explicit 
restriction to the time horizon, our strategy space 
is different from that in the state space approach. 

In order to investigate how the system is evolved by 
selection and adaptation, we start with every possible 
strategy in M1 and M2, respectively, and examine surviving 
strategies to understand the route to the equilibrium. 
While the genetic algorithm has often been used 
in exploring a large strategy space [9, 10], we aim at an 
almost exhaustive search in that all the strategies are 
explicitly considered at least once. In particular, we do 
not include any mutation processes as in Ref. [10], in order to fix 
the strategy space we must scan. Nor do we treat 
stochastic strategies [11, 12, 13, 14], as the deterministic 
representation shows the pure decision characteristics of 
a strategy more clearly. Note that we mostly employ typ- 
ical setups except for the time horizon in order to keep 
the situation as simple as possible. We therefore pass 
over many interesting variations of the PD game, such 
as the idea of payoff-based strategies [15]. The spatial 
structure we study here is a two-dimensional plane which 
provides spatial reciprocity for cooperators [16] (see, e.g., 
Refs. [17, 18] for other topological structures), but we do 
not employ the dynamic preferential selection [19] and 
let each agent play with its every neighbor equally. Un- 
der such conditions, we find that M2 has its own TFT 
modified from the original one in M1, which seemingly 
indicates a generic pattern in the evolution of coopera- 
tion. Even though the reciprocity has been thought of as 
relevant to the emergence of cooperation even in longer 
time horizons, such concrete strategic forms, which 
are directly related to the original TFT, have not been 
reported yet. 

The present paper is organized as follows: In Sec. II, 
we check the case of M1 to introduce our basic scheme. 
In Sec. III, we apply it to M2 and present the surviving 
strategies, including the modified type of TFT. Finally, 
we discuss and conclude this work in Sec. IV. 



II. METHODS 
A. Bitwise representation of strategies in M1 

A strategy in M1 can be conveniently denoted by five 
bits, each of which can take either cooperation (C) or defection 
(D): The first bit, a0, is the move when a player 
first encounters an opponent and thus has empty memory. 
The bit a1 is the move at time t when the player's 
and opponent's previous moves at t − 1 were C and C, 
respectively [henceforth we denote this situation as (player's move at 
t − 1, opponent's move at t − 1) = (C, C)] (Table I). 
Likewise, a2 is for (C, D), a3 for (D, C), and 
a4 for (D, D). 

TABLE I: Bitwise representation of a strategy in M1 as 
a0|a1a2a3a4. 

State           Empty   (C,C)   (C,D)   (D,C)   (D,D) 
Player's move   a0      a1      a2      a3      a4 

Consequently, a strategy in M1 is coded by a0|a1a2a3a4, 
and the total number of strategies is |M1| = 2^5 = 32, for 
each of the five bits can be either C or D. For example, 
C|CDDD, C|CDCD, and C|CDDC encode GT, 
TFT, and Pavlov, respectively. Further examples include 
the unconditional cooperator (ALLC or AC), coded by 
C|CCCC, and the unconditional defector (ALLD or AD), 
coded by D|DDDD. A nice strategy (see above) in M1 is represented 
by a0 = C, implying that it starts with C at 
the first encounter, and a1 = C, meaning that it never 
defects first [23]. 
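
The five-bit coding can be made concrete with a short sketch. It assumes the string form `first|a1a2a3a4` with states ordered (C,C), (C,D), (D,C), (D,D) as in Table I; the function names `parse` and `play` and the dictionary `NAMED` are illustrative choices, not taken from the original work.

```python
from itertools import product

# States seen by a player: (own last move, opponent's last move), in Table I order.
STATES = [("C", "C"), ("C", "D"), ("D", "C"), ("D", "D")]

# Named strategies quoted in the text.
NAMED = {
    "GT": "C|CDDD",
    "TFT": "C|CDCD",
    "Pavlov": "C|CDDC",
    "AC": "C|CCCC",
    "AD": "D|DDDD",
}

# All 2^5 = 32 one-step-memory strategies.
ALL_M1 = [f + "|" + "".join(rest) for f in "CD" for rest in product("CD", repeat=4)]


def parse(code):
    """Split 'C|CDCD' into the first move and a state -> move lookup table."""
    first, rest = code.split("|")
    return first, dict(zip(STATES, rest))


def play(code_a, code_b, rounds):
    """Return the error-free move history [(a_move, b_move), ...] for the given rounds."""
    first_a, table_a = parse(code_a)
    first_b, table_b = parse(code_b)
    a, b = first_a, first_b
    history = [(a, b)]
    for _ in range(rounds - 1):
        # Each player looks up (own last move, opponent's last move).
        a, b = table_a[(a, b)], table_b[(b, a)]
        history.append((a, b))
    return history
```

For instance, `play(NAMED["TFT"], NAMED["AD"], 4)` shows TFT cooperating once and then retaliating for the rest of the game.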



B. Transition graphs and tournament for M1 

Another way of representing a strategy is to list 
all of the possible states it may meet and all of the possible 
transitions between them [8, 20, 21]. Identifying each 
state with a vertex and each transition with an arc (a 
directed edge), self-loops included, this procedure 
yields a transition graph for each strategy. Suppose, 
for example, that Alice employs TFT and Bob employs 
another arbitrary strategy in M1. From Alice's viewpoint, 
the four possible states are represented by four 
pairs, (C,C), (C,D), (D,C), and (D,D), where the former 
character indicates her last move and the latter 
Bob's. If starting with (C,C), the next state must be 
(C,X) with X = C or D depending on Bob's strategy, 
because Alice remembers what Bob did at the last encounter. 
Repeating this for all the states gives Fig. 1. 
One can easily get the graphical representations for any 
other strategy by the same procedure, noting that the 
initial bit a0 does not change the transition graph but 
only makes the starting vertex in the graph different. 

From all the 16 transition graphs in M1, TFT is found 
to be unique in that it does not permit returning to 
(C,D) without visiting (D,C), which implies that no 
strategy can repeatedly exploit TFT while avoiding retaliation. 
In order to describe the time course of the PD 
game between two agents, the distinction between transient 
and recurrent states needs to be made: Transient 
states have only outward arcs and thus cannot be visited 
repeatedly, while the recurrent states are visited over 
and over again. For example, TFT does not have transient 
states, while AD has two transient states, (C,C) and 
(C,D), with two recurrent states, (D,C) and (D,D). 

FIG. 1: Transition graph for TFT. Each vertex represents a 
state in Table I and the directed edges are the possible next 
states allowed by this strategy. Each vertex has two outgoing 
arcs, considering the move taken by the opponent. 

FIG. 2: Transition graphs from combining two strategies in 
M1. (a) GT vs GT. Once deviated from (C,C), the only attractor 
is mutual defection. (b) TFT vs TFT. If mistaken, they do not 
recover mutual cooperation on their own, unless another error 
brings them back. (c) Pavlov vs Pavlov, forgiving an error 
between themselves. (d) Pavlov vs GT. Pavlov is defeated by 
GT if any error occurs. 

If two strategies i and j in M1 play the PD game together, 
the two corresponding graphs are combined to make 
one deterministic transition graph (Fig. 2). The move 
sequence is periodic, and the long-time limit of the average 
payoff per time step is determined only by the recurrent 
states of the two, from which one can easily calculate Uij, 
the average payoff per step that strategy i gains from 
j. In the same spirit as the original tournament held by 
Axelrod, we compute the average points that strategy i 
gets from all strategies (including i) to obtain Table II. 
So long as each pair of strategies has an equal acquaintance 
probability, the tournament results will converge 
to these values in the long-time limit. Moreover, since 
each value in this table is analytically calculated from periodic 
moves in pairs of deterministic strategies, one can 
decompose it into the elementary payoffs, T, R, P, and 
S. For example, AD, AC, and GT earn (T + P)/2 = 3, 
(R + S)/2 = 1.5, and R/2 + (T + P)/4 = 3, respectively. 
One can see that TFT is not the best within M1 
and that strategies with more D bits often outperform 
cooperators. We emphasize that the above results in a 
round-robin tournament are not related to an evolutionary 
process yet and need to be checked from evolutionary 
perspectives. 
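
The tournament average can be sketched as follows, under the assumption that Uij is the mean payoff over the recurrent cycle of the error-free joint dynamics and that the average runs over all 32 strategies with equal weight. The helper names (`long_run_payoff`, `tournament_score`) are illustrative and not from the original work.

```python
from fractions import Fraction
from itertools import product

T, R, P, S = 5, 3, 1, 0
PAYOFF = {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}
STATES = [("C", "C"), ("C", "D"), ("D", "C"), ("D", "D")]
ALL_M1 = [f + "|" + "".join(rest) for f in "CD" for rest in product("CD", repeat=4)]


def parse(code):
    """Split 'C|CDCD' into the first move and a state -> move lookup table."""
    first, rest = code.split("|")
    return first, dict(zip(STATES, rest))


def long_run_payoff(code_i, code_j):
    """Average payoff per step that i gains from j over the recurrent cycle."""
    first_i, table_i = parse(code_i)
    first_j, table_j = parse(code_j)
    state = (first_i, first_j)          # (i's move, j's move) in the current round
    first_visit = {}                    # joint state -> round index of first visit
    gains = []                          # i's payoff in each visited round
    while state not in first_visit:
        first_visit[state] = len(gains)
        gains.append(PAYOFF[state])
        a, b = state
        state = (table_i[(a, b)], table_j[(b, a)])
    cycle = gains[first_visit[state]:]  # drop the transient rounds
    return Fraction(sum(cycle), len(cycle))


def tournament_score(code_i):
    """Average of the long-run payoff of i against every strategy in M1, itself included."""
    return sum(long_run_payoff(code_i, code_j) for code_j in ALL_M1) / len(ALL_M1)
```

Because the joint dynamics has at most four states, the loop terminates after at most four rounds; exact fractions avoid rounding when the values are compared with the decomposition into T, R, P, and S.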



C. Spatial prisoner's dilemma game for M1 

The spatial PD game (SPDG) provides a good framework 
for observing emergent cooperation, as it allows 
the cooperating strategies to form clusters against defectors 
[16]. There is no unique standard in constructing 
SPDG, and a different rule may yield a different output, 
in general. Here we present our SPDG rules, which have 
been extensively used in the literature [18]. 

We perform SPDG on a two-dimensional 128 x 128 
square lattice with the periodic boundary condition. In 
the initial stage of the SPDG, one among all 32 strategies 
in M1 is randomly assigned to each node of the lattice, 
and every agent plays the PD game with her four nearest 
neighbors. After all agents play the game, often called 
one Monte Carlo (MC) step, this procedure is stopped 
with a preassigned probability p or repeats itself with 
1 — p. When stopped, the sequence of games so far is 
termed as one generation whose average time duration 
is 1/p MC steps. In order to make the effects of tran- 
sient states (see above) as weak as possible, p should be 
sufficiently small to ensure that one generation is long 
enough (we observe that p = 0.05, corresponding to one 
generation as 20 MC steps on average, fulfills this require- 
ment). Whenever a generation is closed, the selection 
mechanism is activated as follows: Every node, one by 
one, randomly chooses one of its nearest neighbors and 
adopts the neighbor's strategy if the neighbor has gained 
more during that generation. Memory tables for all pairs 
of agents are recalculated and payoffs are initialized back 
to zero, and then the next generation begins. 
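
One generation of the rules above can be sketched as follows. This is only an illustration under stated assumptions: the lattice is a list of lists of strategy codes of the form `C|CDCD`, imitation is applied synchronously using the payoffs of the finished generation (the text leaves the update order open), and no errors are included. The names `run_generation`, `generation_length`, and `move` are hypothetical.

```python
import random

PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
STATES = [("C", "C"), ("C", "D"), ("D", "C"), ("D", "D")]


def move(code, memory):
    """Move of a strategy given (own last move, opponent's last move), or None if empty."""
    if memory is None:
        return code[0]
    return code[2 + STATES.index(memory)]


def neighbors(x, y, size):
    """Four nearest neighbors under the periodic boundary condition."""
    return [((x + 1) % size, y), ((x - 1) % size, y),
            (x, (y + 1) % size), (x, (y - 1) % size)]


def generation_length(p, rng):
    """Number of MC steps: stop with probability p after each step (mean 1/p)."""
    steps = 1
    while rng.random() >= p:
        steps += 1
    return steps


def run_generation(lattice, p, rng):
    """Play one generation and return the lattice after the imitation step."""
    size = len(lattice)
    sites = [(x, y) for x in range(size) for y in range(size)]
    # Each bond is listed once (right and down neighbors); assumes size >= 3.
    bonds = [(u, v) for u in sites
             for v in (((u[0] + 1) % size, u[1]), (u[0], (u[1] + 1) % size))]
    memory = {bond: None for bond in bonds}   # last (move of u, move of v) per bond
    payoff = {site: 0 for site in sites}      # payoffs start from zero each generation
    for _ in range(generation_length(p, rng)):
        for u, v in bonds:
            last = memory[(u, v)]
            mu = move(lattice[u[0]][u[1]], last)
            mv = move(lattice[v[0]][v[1]], None if last is None else (last[1], last[0]))
            payoff[u] += PAYOFF[(mu, mv)]
            payoff[v] += PAYOFF[(mv, mu)]
            memory[(u, v)] = (mu, mv)
    # Selection: adopt a random neighbor's strategy if that neighbor gained more.
    new_lattice = [row[:] for row in lattice]
    for x, y in sites:
        nx, ny = rng.choice(neighbors(x, y, size))
        if payoff[(nx, ny)] > payoff[(x, y)]:
            new_lattice[x][y] = lattice[nx][ny]
    return new_lattice
```

A run would call `run_generation` repeatedly, for example with `rng = random.Random(0)` and p = 0.05; memory and payoffs are rebuilt inside each call, which mirrors the reset between generations.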

Our SPDG simulation readily shows that a cooperating 
equilibrium, in which all of the agents are playing C, 
is achieved mostly by GT, TFT, and Pavlov, together 
with a minor strategy C|CDCC [Fig. 3(a)]. It is notable 
that these four surviving strategies are, in fact, the four 
possibilities when we fix the nice bits (a0 = C, a1 = C) 
and the retaliating bit (a2 = D). This implies that the 
virtues of TFT (see above) are indeed very important 
conditions for a strategy to be evolutionarily successful. 

TABLE II: Average points $\frac{1}{|M_1|}\sum_j U_{ij}$ for each strategy i in M1. 

Strategy  Points    Strategy  Points    Strategy  Points    Strategy  Points 
AD        3.00      C|DDDC    2.73      C|DDCC    2.25      C|DCCD    1.69 
D|CCDD    3.00      D|DCDC    2.73      D|DDCD    2.22      C|DCCC    1.63 
C|DDDD    3.00      Pavlov    2.56      C|CDCC    2.19      AC        1.50 
GT        3.00      D|CCDC    2.38      D|CDCC    2.09      C|CCDC    1.50 
D|CDDD    3.00      C|DDCD    2.36      D|CDCD    2.02      C|CCCD    1.50 
D|DCDD    3.00      TFT       2.35      C|DCDC    1.90      C|CCDD    1.50 
D|DDDC    2.97      D|DDCC    2.25      D|DCCD    1.86      D|CCCC    1.50 
D|CDDC    2.89      C|DCDD    2.25      D|DCCC    1.81      D|CCCD    1.38 

FIG. 3: (Color online) Comparison between SPDG and RD. 
(a) A simulation result on a 128 × 128 lattice with p = 0.05. 
(b) Numerical integration of Eq. (2). 

FIG. 4: (Color online) A pattern in SPDG on the 128 × 128 
lattice with e = 0.01 and p = 0.05. Even in the presence of 
error, almost all the dynamical patterns at large time scales 
occur within the strategies found in the error-free RD, if the 
error probability is sufficiently low. 



D. Replicator dynamics and filtering 

As the direct SPDG often requires a large amount of computation, 
we bypass the problem using the replicator 
dynamics (RD) [22] with average payoffs [23, 24, 25]: 
Once the average payoffs are obtained from the transition 
graphs, the time evolution of the fraction of each strategy 
can, phenomenologically but conveniently, be described 
by RD within the assumption of the full mixing, corre- 
sponding to the mean-field approximation. 

Suppose that we perform SPDG with randomly distributed 
strategies on a two-dimensional L × L square 
lattice with the total number of agents Na = L^2. Since 
each agent plays the game with z = 4 nearest neighbors, 
the expected gain that an agent with the strategy i collects, 
within the assumption of full mixing, is written 
as 

$$U_i = \sum_j z\, U_{ij}\, \phi_j, \tag{1}$$

where $\phi_j$ is the fraction defined as the number of agents 
of the strategy j divided by Na, with $\sum_i \phi_i = 1$, and 
Uij is the above-mentioned average gain i gets from j. 
If the relative growth rate of a strategy is proportional 
to the deviation of its payoff from the average over the 
whole population, we may write an ordinary differential 
equation 

$$\frac{1}{\phi_i}\frac{d\phi_i}{dt} = U_i - \sum_j U_j\, \phi_j, \tag{2}$$

which is called the replicator dynamics; here the proportionality 
constant is absorbed into the unit of time. Note that if each 
strategy forms a cluster, the summation over the nearest 
neighbors of i cannot cover the whole space, and we must 
examine what happens near the interfaces. 
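
A numerical integration of the replicator dynamics can be sketched with a forward Euler step. The sketch assumes the standard form in which the growth rate of each fraction is its payoff minus the population average, with the mean-field payoff being z times the sum of U[i][j] times phi[j]; the step size, the renormalization after each step, and the function names are choices made here for illustration only.

```python
def replicator_step(phi, U, z=4, dt=0.01):
    """One forward-Euler step of d(phi_i)/dt = phi_i * (U_i - sum_j U_j * phi_j)."""
    n = len(phi)
    # Mean-field payoff of each strategy under full mixing.
    payoff = [z * sum(U[i][j] * phi[j] for j in range(n)) for i in range(n)]
    mean = sum(payoff[i] * phi[i] for i in range(n))
    new_phi = [phi[i] + dt * phi[i] * (payoff[i] - mean) for i in range(n)]
    # Renormalize to remove accumulated rounding error (an implementation choice).
    total = sum(new_phi)
    return [x / total for x in new_phi]


def integrate(phi, U, steps, z=4, dt=0.01):
    """Apply `steps` Euler steps and return the final fractions."""
    for _ in range(steps):
        phi = replicator_step(phi, U, z=z, dt=dt)
    return phi


# Toy example: unconditional cooperators (index 0) against unconditional
# defectors (index 1), with long-run payoffs R, S, T, P = 3, 0, 5, 1.
U_TOY = [[3, 0],
         [5, 1]]
final = integrate([0.5, 0.5], U_TOY, steps=2000)
```

In this two-strategy toy case the defectors take over, as expected in the absence of conditional strategies; supplying the full 32 × 32 matrix of Uij for M1 in place of `U_TOY` would give the kind of integration described in the text.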

Although the RD description is cruder than the 
actual SPDG with local interactions, we find that the 
numerical integration of RD is surprisingly similar to 
what SPDG yields with the random initial distribution 
of strategies. Figure 3(b) shows that the four 
nice strategies, GT, TFT, Pavlov, and C|CDCC, survive 
just as in the previous observation made for SPDG. 
Furthermore, the order of relative fractions of the four 
is identical in both results. Note that these four strategies 
are indistinguishable at this stage, because the bits 
other than a1 are not actually used any more. In order to 
slightly activate those bits and check how the surviving 
strategies behave in the presence of erroneous decisions, 
we allow each player in SPDG to make mistakes with a 
given probability e. For example, e = 0.01 means that an 
agent's memory of a neighbor's last move may be flipped 
from C to D or from D to C once in 100 moves on average. 
Depending on the initial condition, various steady-state 
configurations are obtained. In many cases, however, we 
find that Pavlov eventually conquers the whole territory, 
defeating TFT, under such a low error rate (Fig. 4). 

It is important that the error-free RD equilibrium 
selects out the long-run strategies which appear in 
SPDG [26]. The dynamics among these strategies are 
driven by errors on much larger time scales than the fast 
extinctions. When e ≪ 1, the difference between these two 
time scales makes it possible to separate the fast extinctions 
from the long-run behaviors. We point out that this 
selection can be further simplified, considering that each 
strategy occupies only a small fraction at the early stages 
and that the strategy with the least payoff decreases most 
rapidly. That is, the least fit strategy will in effect be removed 
from the population shortly, and the remainder's 
payoffs are rectified accordingly. Eliminating the least fit 
actually reaches the same cooperating equilibrium with 
the minimal number of computations, and it works similarly 
to the technique called the iterated elimination of 
dominated strategies [27]. This procedure will be denoted 
as RD filtering since it is based on a fundamental 
assumption of RD that the growth rate of a species is 
proportional to its payoff. After it simulates the initial 
short times until reaching an equilibrium, we come back 
to SPDG and consider the slow dynamics due to errors 
among survivors. Nevertheless, we stress that this procedure 
is only a rough approximation and one should be 
careful not to expect general coincidence between them. 
Based on the numerical support in M1, we suggest 
that this procedure can be regarded as a criterion that 
a feasible strategy is supposed to pass, rather than as 
a precise equivalent of SPDG. One obvious drawback is 
that it precludes much of the possibility of cyclic behaviors 
allowed by the continuous RD [27], as we give the 
least fit no chance to return (via, e.g., mutations) 
once removed. 
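
The elimination loop can be sketched as follows. Several details are assumptions made here rather than statements from the text: every surviving strategy is weighted equally, the payoff of a survivor is the sum of its Uij over the current survivors, all strategies tied for the minimum are removed together, and the loop stops once all survivors earn the same payoff. The name `rd_filter` is illustrative.

```python
def rd_filter(U):
    """Iteratively remove the least fit strategies from a payoff matrix U.

    U[i][j] is the long-run payoff that strategy i gains from strategy j.
    Returns (survivors, history), where history lists the indices removed
    at each step.
    """
    alive = list(range(len(U)))
    history = []
    while True:
        # Payoff of each survivor against the current survivors, itself included.
        payoff = {i: sum(U[i][j] for j in alive) for i in alive}
        lowest, highest = min(payoff.values()), max(payoff.values())
        if lowest == highest:
            # Nobody is strictly less fit than anybody else: equilibrium.
            return alive, history
        history.append([i for i in alive if payoff[i] == lowest])
        alive = [i for i in alive if payoff[i] != lowest]


# Toy example with three strategies: AC (0), AD (1), and GT (2).
U_TOY = [[3, 0, 3],   # AC is exploited by AD
         [5, 1, 1],   # AD exploits AC but earns only P against GT
         [3, 1, 3]]   # GT cooperates with AC and GT, and retaliates against AD
survivors, removed = rd_filter(U_TOY)
```

In this toy case the naive cooperator is removed first, after which the defector loses its prey and is removed next, leaving GT, which mirrors the two-stage pattern described for the full strategy sets. Integer or exact rational payoffs are advisable so that the equality test is reliable.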



III. APPLICATION TO M2 

A. Approach to memory effects 

Let us proceed to the study of the strategies in M2 to 
examine the effects of memory capacity in evolution. In 
order to decide the move at time t, an agent needs to 
remember her own moves and the opponent's moves at 
t − 1 and t − 2, corresponding to 2^4 = 16 bits. 
Until the agent has met the opponent more than once, the 
past information is not yet available, and thus the strategy 
should specify the moves for this case with two more bits 
for the initial two encounters. Accordingly, the number 
of strategies in M2 is counted as |M2| = 2^(16+2) = 262 144. 
Based on the previous results for M1, we use the same 
method to filter out unsuccessful strategies at an early 
stage, and then play SPDG only for the surviving strategies. 
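
The counting can be illustrated with a short sketch that enumerates the 16 memory states and decodes an integer index into an 18-bit strategy. The bit order (two initial moves first, then one move per state) and the name `decode` are assumptions made for illustration.

```python
from itertools import product

# Two consecutive moves of one player: CC, CD, DC, DD.
PAIRS = ["".join(moves) for moves in product("CD", repeat=2)]

# A state is (own two moves, opponent's two moves): 4 x 4 = 16 states.
STATES_M2 = [(own, opp) for own in PAIRS for opp in PAIRS]

N_STATE_BITS = len(STATES_M2)      # 16 bits, one move per state
N_INITIAL_BITS = 2                 # moves for the first two encounters
N_STRATEGIES = 2 ** (N_STATE_BITS + N_INITIAL_BITS)


def decode(index):
    """Map an integer in [0, N_STRATEGIES) to (initial moves, state -> move table)."""
    bits = format(index, "018b").replace("0", "C").replace("1", "D")
    return bits[:N_INITIAL_BITS], dict(zip(STATES_M2, bits[N_INITIAL_BITS:]))
```

With this layout, index 0 decodes to the unconditional cooperator and the last index to the unconditional defector.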



B. RD filtering 

In the filtering procedure for M2, we again calculate 
Uij in the same way as before and use the mean-field 
payoff function in Eq. (1), assuming full mixing. This 
reflects the fact that the initial strategies are randomly 
distributed and the number of remaining strategies turns 
out to be large enough to neglect clustering effects even 
at the equilibrium. The iterated elimination stops when 
no more strategies can be removed. 

During the 1.4 × 10^5 steps to reach the goal, we record 
the number of remaining strategies N and their expected 
payoffs U, which range over [Umin, Umax]. For ease of comprehension, 
this range is divided by N at each step in Fig. 5. Both 
Umin/N and Umax/N eventually shrink to a single point 
at 3.0, indicating that all of the strategies obtain R = 3 
from mutual cooperation. From the concave shape of 
Umax/N, we see two eras: Roughly before three-quarters 
of the whole period, Umax/N decreases by removing the 
least fit, as the top-ranked strategies exploit naive cooperators. 
After the prey is used up, however, they 
become the next victims. Removing defectors now enhances 
the degree of cooperation, and Umax/N rises 
as well. Two great extinctions are observed, in 
that 2^12 = 4096 strategies disappear simultaneously at 
the 1424th and 124 910th steps, respectively. These are 
symbolic of the two eras, because AC is removed at the first 
extinction and AD at the second. 

TABLE III: Strategy table for M2. 

State(a)  Ω (C/D)(b)  EC(c)  ET(d)  I-TFT(e)    State    Ω (C/D)  EC  ET  I-TFT 
(CC,CC)   100/0       C      C      C           (DC,CC)  50/50    D   D   C 
(CC,CD)   42/58       C      C      D           (DC,CD)  45/55    -   -   D 
(CC,DC)   52/48       D      D      C           (DC,DC)  50/50    C   D   C 
(CC,DD)   6/94        -      -      D           (DC,DD)  47/53    -   -   C 
(CD,CC)   54/46       C      C      D           (DD,CC)  52/48    -   -   C 
(CD,CD)   48/52       C      C      -(f)        (DD,CD)  47/53    -   -   C 
(CD,DC)   56/44       -      -      C           (DD,DC)  50/50    -   -   C 
(CD,DD)   31/69       -      -      D           (DD,DD)  53/47    -   -   D 

(a) A state (X1X2, Y1Y2) means that X1 and X2 (Y1 and Y2) are the player's (the opponent's) moves at two subsequent times, respectively. 
(b) Percentages of C and D in Ω, the set of the remaining strategies after RD filtering. 
(c) Efficient cooperator's moves at each given state. 
(d) Efficient trigger's moves. Although similar to EC's, this does not follow EC at (DC, DC) but defects there. 
(e) I-TFT's moves. The moves at the recurrent states are underlined. 
(f) If both C and D are observed, the move is left blank (shown as -). 

FIG. 5: (Color online) Filtering procedure on M2. It takes 
about 1.4 × 10^5 steps to reach a cooperating equilibrium. (a) 
The remaining fraction N/Nt, where N is the number of survivors 
and Nt = 2^18 = 262 144. Insets show the two great 
extinction events at the 1424th and 124 910th steps, respectively. 
(b) The maximum and minimum values of the payoffs, 
divided by N. The straight line represents R = 3, the reward 
for mutual cooperation. Insets and arrows are for the great 
extinction events again. 

After completing the filtering procedure, we find an 
equilibrium, where 12 944 surviving strategies constitute 
a set Ω. The number is still large but only about 5% of 
|M2|. There are two properties in this set: (i) All strategies 
in Ω are nice in the sense that they never defect first. 
(ii) After being defected against in the last two steps, about 94% of the 
surviving strategies choose to retaliate. It is also remarkable 
that GT and TFT are included in Ω but Pavlov is 
dropped out. 



C. Spatial Prisoner's Dilemma Game for M2 

We next perform SPDG with e > 0 for Ω on a two-dimensional 
500 × 500 lattice in the same manner as we 
did for M1. Note that the lattice size is almost 20 times 
greater than the number of strategies, which turns out to 
be enough to find recognizably common patterns. After 
2.4 X 10^ generations, most strategies in Ω also disappear 
and the number of survivors is usually less than 10 in 
each realization (Table III). 

First, we observe two strategies with only eight recurrent 
states each. They are both named intelligent 
TFT (I-TFT), because TFT is embedded as 
an attractor in their recurrent states and the transient 
states are activated only when an error occurs [Fig. 6(a)]. 
Again, the state (X_t X_{t+1}, Y_t Y_{t+1}) represents that X_t and 
X_{t+1} (Y_t and Y_{t+1}) are the player's (the opponent's) 
moves at two subsequent times (X, Y = C or D), which 
is connected to (X_{t+1} X_{t+2}, Y_{t+1} Y_{t+2}) by a directed arc. 
Without errors, they are ordinary TFT and are never exploited 
repeatedly by any other strategies. With errors, on the 
other hand, they return to mutual cooperation without 
the chain of retribution between themselves, overcoming 
the weakness of the classical TFT. Furthermore, this 
error tolerance is secured from repeated abuse by being 
transient. We therefore conclude that the only way to 
defeat I-TFT is more efficient cooperation than I-TFT's. 
As long as errors occur rarely enough not to disturb 
its recovery path, I-TFT will clear the defecting strategies 
out and eventually make way for better cooperators. 

FIG. 6: (Color online) Graphical representations of surviving strategies in M2. (a) The full transition graph for I-TFT. Only 
the black vertices are recurrent states, while others are transient (see also Fig. 1). The dashed arcs indicate the paths activated 
when an error occurs between I-TFT's. Since two strategies acting as I-TFT are found, we describe the duality in the graphs 
by the dotted arcs connected to the two (CD, CD) vertices (at the top-left and the bottom). (b) Parts of transition graphs characterizing 
EC and ET. While an EC-typed strategy tries to recover mutual cooperation (CC, CC) from an erroneous state (CC, CD) by 
the dashed lines, an ET-typed strategy repeatedly defects against it by the dotted line. 

Such efficient cooperators are characterized by the way 
of dealing with an error between themselves, depicted in 
Fig. 6(b) with the dashed lines. We denote those strategies 
with such an error recovery path as efficient cooperators 
(EC). An EC-typed strategy outperforms I-TFT 
because it costs one point less in recovering from an error. 
This one point may look small but has a significant 
meaning after thousands of generations. Yet an EC 
strategy can be invaded by even such a trivial strategy in 
M1 as a0|DDCC, which simply alternates between C and 
D, regardless of a0. That is, inserted among the strategies 
of M1, an EC-typed strategy does not overwhelm 
M1 and sometimes becomes exterminated. Meanwhile, 
I-TFT under the same condition works so successfully 
that it wins the whole area by defeating all of the M1 
strategies, including Pavlov, in every realization so long 
as p is small enough. 

Last, some cooperating strategies are triggered to deceive 
EC by a single error: At the last step of EC's error 
recovery phase, they defect again, instead of getting back 
to (CC, CC) as desired, and complete the exploiting loop 
[see the dotted line in Fig. 6(b)]. Since they are trigger 
strategies specialized to defeat EC, we simply 
call them efficient triggers (ET). Even so, I-TFT suppresses them 
and helps EC to rise [Figs. 7(a) and 7(b)]. 

Those two I-TFT strategies are distinguished by how 
they respond to the state (CD, CD) (Table III). Let us 
denote the I-TFT strategy responding with C as I-TFTc 
and that with D as I-TFTd. Comparing a population 
of I-TFTc with that of I-TFTd, the former is slightly 
better off, as the latter has a probability of O(e^2) that 
both players make errors at the same time, leading to 
(CD, CD) → (DD, DD) [Fig. 6(a)]. 

If we repeat this SPDG procedure after removing I-TFT 
from Ω, some variants of I-TFT play the role of 
protecting EC. They have only one or two different bits 
from either of the I-TFT strategies, but their recurrent states 
do not constitute TFT. Further removing such variants, 
we see that EC strategies are helplessly threatened by 
the parasitic ET [Fig. 7(c)]. Since ET strategies cannot 
do well with errors, the level of cooperation remains low. 
This comparison clearly shows the crucial role of I-TFT. 

Let us recall Pavlov in comparison with EC: While 
GT and TFT ignored the presence of an error within 
the same species, so as not to be exploited by anyone, Pavlov 
invented a recovery path (C, D) → (D, D) → (C, C) and 
could be the final winner in M1. Nevertheless, it is at this 
very point that GT overruns Pavlov. It is therefore not 
surprising that Pavlov fails to enter Ω, because so many 
strategies of M2 are willing to exploit its shortsighted 
tolerance. Even though EC devises a more sophisticated 
recovery path than Pavlov's, it is still far from safe. The 
point is that all of their states are recurrent: Even if they 
use all of the given memory capacity to determine the next 
move, once the patterns are recognized, the opponent can 
get back to the defecting state as many times as it wants. 
However, EC strategies are successful in the long run, 
because they try to cooperate better at some expense of 
security. The success of EC crucially depends on the 
existence of such balancing strategies as I-TFT, and is 
thus path-dependent. 

FIG. 7: (Color online) SPDG for M2. (a) All of the 12 944 
strategies in Ω are initially distributed on 500 × 500 lattices. 
They are classified as EC, ET, I-TFT, and other miscellaneous 
ones, and the plotted values are averaged over 50 realizations. 
(b) Three representative strategies belonging to EC, ET, and 
I-TFT, respectively, are distributed on a 32 × 32 lattice. (c) 
Averaged results over 10 realizations on 500 × 500 lattices, after 
removing the two I-TFT strategies and their six similar variants 
(see text) from Ω. In all cases, p = 0.02 and e = 0.01. 



IV. SUMMARY 

In summary, we presented a thorough examination of 
strategies under restrictions of the time horizon in the 
iterated PD game. As the time horizon is enlarged, a 
variety of trajectories to equilibrium become possible, 
but there are still common dynamical patterns. That is, 
the system reaches efficient cooperation through intermediate 
prevalence of TFT-like strategies, which solve the 
dilemma between security and tolerance by using transient 
states. As I-TFT spends most of its time as the classical 
TFT, which refers only to the opponent's last move, it 
becomes even more likely to win if memory is costly [21]. 
This gives a clue for understanding how memory 
could be effectively saved in social interactions and differentiated 
into other functions. 

The detailed features of our observation in this paper 
may be partially owing to our specific choice of elementary 
payoff values. However, we believe that the successful 
strategies such as I-TFT and the dynamical patterns 
between them have good reasons to be remarkable in a 
more general context of the evolutionary PD game. 